An annotated corpus for the analysis of VP ellipsis
Authors
Abstract
Verb Phrase Ellipsis (VPE) has been studied in great depth in theoretical linguistics, but empirical studies of VPE are rare. We extend the few previous corpus studies with an annotated corpus of VPE in all 25 sections of the Wall Street Journal corpus (WSJ) distributed with the Penn Treebank. We annotated the raw files using a stand-off annotation scheme that codes the auxiliary verb triggering the elided verb phrase, the start and end of the antecedent, the syntactic type of the antecedent (VP, TV, NP, PP or AP), and the type of syntactic pattern between the source and target clauses of the VPE and its antecedent. We found 487 instances of VPE (including predicative ellipsis, antecedent-contained deletion, comparative constructions, and pseudo-gapping) plus 67 cases of related phenomena such as do so anaphora. Inter-annotator agreement was high, with an average F-score of 0.97 for three annotators on one section of the WSJ. Our annotation is theory-neutral and has better coverage than earlier efforts that relied on automatic methods. For example, simply searching the parsed version of the Penn Treebank for empty VPs achieves high precision (0.95) but low recall (0.58) when compared with our manual annotation. The distribution of VPE source–target patterns deviates markedly from the standard examples found in the theoretical linguistics literature on VPE, once more underlining the value of corpus studies. The resulting corpus will be useful for studying VPE phenomena as well as for evaluating natural language processing systems equipped with ellipsis resolution algorithms, and we propose evaluation measures for VPE detection and VPE antecedent selection. The stand-off annotation is freely available for research purposes.
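The abstract reports precision and recall for the automatic empty-VP search but does not spell out how detection is scored. The following is a minimal sketch, assuming the standard set-based definitions of precision, recall, and balanced F-score over detected ellipsis sites. The function names and the `(file_id, offset)` key format are hypothetical illustrations, not the corpus's actual format or the paper's proposed measures.

```python
def f_score(precision, recall):
    """Balanced F-score (harmonic mean); returns 0.0 if both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(gold, predicted):
    """Score VPE detection by exact match on ellipsis-site keys.

    `gold` and `predicted` are sets of hashable keys, here assumed to be
    (file_id, trigger_offset) pairs; this key format is an assumption.
    Returns (precision, recall, f_score).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical toy data: three gold sites, three predicted, two in common.
gold_sites = {("wsj_0001", 10), ("wsj_0001", 50), ("wsj_0002", 5)}
predicted_sites = {("wsj_0001", 10), ("wsj_0002", 5), ("wsj_0002", 99)}
precision, recall, f1 = detection_scores(gold_sites, predicted_sites)

# F-score implied by the precision and recall quoted in the abstract.
empty_vp_search_f = f_score(0.95, 0.58)
```

Under these assumptions, the precision of 0.95 and recall of 0.58 quoted for the empty-VP search correspond to a balanced F-score of roughly 0.72.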
Similar resources
A Conversation Analysis of Ellipsis and Substitution in Global Business English Textbooks
Despite the body of research on textbook evaluation from the discourse analysis perspective, cohesive devices have rarely been analyzed in English for Specific Purposes (ESP) textbooks. The acquisition and use of cohesive devices is inherent to naturalistic communication, including business interactions. Hence, L2 learners of business English should be exposed to these devices through cohesion-...
An Algorithm for VP Ellipsis
An algorithm is proposed to determine antecedents for VP ellipsis. The algorithm eliminates impossible antecedents, and then imposes a preference ordering on possible antecedents. The algorithm performs with 94% accuracy on a set of 304 examples of VP ellipsis collected from the Brown Corpus. The problem of determining antecedents for VP ellipsis has received little attention in the literature,...
An Empirical Approach to VP Ellipsis
This paper reports on an empirically based system that automatically resolves VP ellipsis in the 644 examples identified in the parsed Penn Treebank. The results reported here represent the first systematic corpus-based study of VP ellipsis resolution, and the performance of the system is comparable to the best existing systems for pronoun resolution. The methodology and utilities described can...
A Contrastive Study of Persian and English Written Discourse: Ellipsis in Realistic Novels
This study aspires to examine the concept of ellipsis by comparing and contrasting English and Persian written texts. For this purpose, three Persian novels and three English ones were selected. These novels were analyzed carefully; they were compared and contrasted for types and amount of ellipsis used, through a Chi-square analysis. The results of the data analysis revealed that various t...
Founded by Benjamin Franklin in 1740 The Institute For Research In Cognitive
The central claim of this dissertation is that an elliptical VP is a proform. This claim has two primary consequences: first, the elliptical VP can have no internal syntactic structure. Second, the interpretation of VP ellipsis must be governed by the same general conditions governing other proforms, such as pronouns. The basic condition governing the interpretation of a proform is that it must...
Journal title:
- Language Resources and Evaluation
Volume 45, Issue
Pages -
Publication date: 2011